
****divnearestsep weights****
*This file generates distributing(!) inventor weights by technology (division) for the simulation.
*The weights are based on the unique countries of origin mentioned in the patent applications per patent family made by each firm.
*For missing years, we interpolate.

*Note: Distributing weights are NOT used in the main analysis of the paper. There we simply distribute the patent counts to the countries and sum up per country
*to obtain the spillovers. Because the patent counts here are simulated and may differ from the observed ones, especially in years where in reality no patent inventors stem
*from a country, we need to create weights and interpolate missing ones. We then distribute the simulated patents to the countries and sum up.

*All years (per company) where we have data on the technology are contained in the initial file loaded (countries in columns). 
*We then interpolate the missing years (where no patents of that technology have been created at all for a firm).
*At the end we reformat and drop countries that a firm has no relation to at all.

*After constructing the weights, we merge the two technologies together and reformat for the simulation; otherwise the structure does not make sense.
*Countries where a firm has applied for only one technology but not the other get a weight of 0 for the other technology in that year.
*In the end we have a dataset at the firm-year-country-technology level with a column "weight" that holds the weights
*for both technologies, restricted to countries that are assigned some weight for either technology.
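
*Illustration with hypothetical numbers (not taken from the data): suppose that, for one technology, a firm has
*inventor counts DE = 3 and US = 1 in 1995 and no patents in 1996, with 1995 being the closest year with data.
*The 1995 weights are DE = 3/4 = .75 and US = 1/4 = .25. The 1996 counts are filled in from 1995, so the 1996 weights
*are again .75 and .25, and 100 simulated patents in 1996 would be distributed as 75 to DE and 25 to US.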

*0. making weights
do ${d}code/config/country_list.do
global ctrylistX = "$invtcountrylist"
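
*Optional guard (a sketch, not part of the original pipeline): stop early if the country list is empty.
*This assumes country_list.do defines the global invtcountrylist as a space-separated list of country codes;
*with an empty list, the country loops below would be skipped and the script would fail later with a less clear error.
if "$ctrylistX" == "" {
    display as error "ctrylistX is empty; check code/config/country_list.do"
    exit 198
}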


*The datasets in here get very large and take very long to merge after the interpolation.
*We cut off at 1980 to limit this; 1980 is reasonable in our opinion, as the patent data become much sparser before 1980.
global firstyear = 1980

*weight construction
foreach division in auto95 pauto95 {
    *load the firm-inventorcounts
    use ${d}datasets/macrosim/bvd_year_inventor_count_`division'_bia.dta, clear

    *go through the interpolation process to get a count for all firms and years
    drop if year == 9999
    keep if year >= $firstyear
    keep if year <= 2011

    *generate non-string identifier (required for the xtset command)
    egen lse_id = group(BvD)
    preserve
        keep BvD lse_id
        duplicates drop BvD, force
        tempfile BvD_lookup
        save `BvD_lookup'
    restore

    *generate missing years for each firm
    xtset lse_id year
    tsfill, full

    *get back our BvD identifier
    drop BvD
    mmerge lse_id using `BvD_lookup'

    *Interpolate inventor counts per country for the newly inserted years, based on the closest year for that firm-country combination.
    foreach ctry in $ctrylistX {
        bys lse_id: mipolate nb_pat_invt_`ctry' year, gen(nb_pat_invt_`ctry'2) nearest
        drop nb_pat_invt_`ctry'
    }

    *generating weights
    egen total_pat = rowtotal(nb_pat_invt_??2), missing
    foreach ctry in $ctrylistX {
        gen share_invt_1995_`ctry' = nb_pat_invt_`ctry'2 / total_pat
    }
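
    *Optional sanity check (a sketch, not part of the original pipeline): within each firm-year that has
    *any inventor count, the country shares should sum to 1. The tolerance of 1e-5 is an assumption,
    *chosen to allow for float rounding.
    tempvar share_sum
    egen `share_sum' = rowtotal(share_invt_1995_*)
    assert abs(`share_sum' - 1) < 1e-5 if total_pat > 0 & !missing(total_pat)
    drop `share_sum'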

    *reshape and drop missing values
    keep year BvD share*
    drop if missing(BvD)
    drop if missing(year)
    sort BvD
    
    reshape long share_invt_1995_, string i(BvD year) j(ctry)
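    *use the full variable name created by the reshape, so the rename does not rely on variable abbreviation
    ren share_invt_1995_ weight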
    drop if weight == 0
    drop if missing(weight)
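
    *Optional check (a sketch, not part of the original pipeline): after the reshape, each
    *firm-year-country combination should appear exactly once; isid exits with an error otherwise.
    isid BvD year ctry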
    save ${d}datasets/macrosim/bvd_ctry_year_inventor_weights`division'nearestsep_long.dta, replace

    *generate a country list per firm, to be used later in (2.)
    preserve
        keep BvD ctry 
        duplicates drop
        save ${d}datasets/macrosim/bvd_ctry_`division'_nearestsep.dta, replace
    restore
}


*1. Now we need to generate a list of all firm-year combinations for firms that have either a pauto95 or an auto95 patent,
*2. then a list of each firm's countries,
*3. and then merge the two together before merging in the weights.

*1 firm year combinations that have a pauto or auto patent
keep BvD year
duplicates drop

*We merge the two datasets together to get a full set of BvD-year combinations.
*Reminder: the loop above finishes with pauto95 loaded, so we merge auto95 in.
*This can be done in two ways: with unmatched(both) or unmatched(none). Use the latter if we only want firms that
*have a patent in both pauto and auto, the former if we want all firms that have a patent in either technology.
mmerge BvD year using ${d}datasets/macrosim/bvd_ctry_year_inventor_weightsauto95nearestsep_long.dta, unmatched(both)
keep BvD year
duplicates drop
tempfile bvd_year_lookup
save `bvd_year_lookup', replace

*2 firm country combinations, irrespective of technology or if a firm has both technologies (use the files created in the loop from 0)
use ${d}datasets/macrosim/bvd_ctry_pauto95_nearestsep.dta, clear
append using ${d}datasets/macrosim/bvd_ctry_auto95_nearestsep.dta
duplicates drop
tempfile bvd_ctry_lookup
save `bvd_ctry_lookup', replace

*3 merging together to get a complete set of year-country combinations per firm, for firms that have either type of patent
use `bvd_year_lookup', clear
mmerge BvD using `bvd_ctry_lookup', unmatched(none) 
drop _m
sort BvD year ctry
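
*Optional check (a sketch, not part of the original pipeline): the merges below assume that each
*firm-year-country combination appears exactly once in this skeleton; isid exits with an error otherwise.
isid BvD year ctry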

*Now we should have a set of BvD-year-ctry combinations for firms that have a pauto or an auto patent at some point in the timeframe.
*We still need to merge the weights together into one long dataset;
*at the end of this, every firm should have a weight per year-country combination.
mmerge BvD year ctry using ${d}datasets/macrosim/bvd_ctry_year_inventor_weightsauto95nearestsep_long.dta, unmatched(master)
ren weight weight_auto95
drop _m
mmerge BvD year ctry using ${d}datasets/macrosim/bvd_ctry_year_inventor_weightspauto95nearestsep_long.dta, unmatched(master)
ren weight weight_pauto95
drop _m

*Now we may have missing weights for countries that are present in only one of pauto or auto patents,
*and country-year combinations (per firm) that have neither weight (no technology by an inventor from that country in that year).
*We replace the former with 0 (below) and drop the latter to reduce the storage space needed.
drop if missing(weight_auto95) & missing(weight_pauto95)
replace weight_auto95 = 0 if missing(weight_auto95)
replace weight_pauto95 = 0 if missing(weight_pauto95)

*Now we have a full set of weights, but we still need them in BvD-year-ctry-division format:
*drop one weight, save the first set, restore and drop the other weight,
*generate division, re-append the first set, set division for the first set of weights, encode division.
preserve
keep BvD year ctry weight_auto95
ren weight_auto95 weight
tempfile auto95_weights
save `auto95_weights', replace
restore

keep BvD year ctry weight_pauto95
ren weight_pauto95 weight
gen division = "pauto95"
append using `auto95_weights'
replace division = "auto95" if division == ""
encode division, gen(division_encoded)

*checking the encode has the right levels. auto95 should be 1, pauto95 should be 2
label list division_encoded
assert division_encoded == 1 if division == "auto95"
assert division_encoded == 2 if division == "pauto95"

drop division
ren division_encoded division
order BvD year ctry division
sort BvD year ctry division

*Data check: do the weights sum to 1 per firm-division-year? A sum of zero is also ok.
*Careful: asserts on exact values will fail due to rounding errors in the weights; we have a bunch that sum to .99999999.
*The tolerance is therefore 1e-6 on both sides of 1, so sums noticeably above 1 are caught as well.
*Also check for duplicates; the duplicates drop below should not drop any observations!
bys BvD year division: egen sum_weight = total(weight)
assert abs(sum_weight - 1) < 1e-6 | sum_weight == 0
drop sum_weight
duplicates drop
drop if missing(weight)
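
*Optional check (a sketch, not part of the original pipeline): the final dataset should be uniquely
*identified at the firm-year-country-division level; isid exits with an error otherwise.
isid BvD year ctry division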

save ${d}datasets/macrosim/bvd_ctry_year_inventor_weightsdivnearestsep_long.dta, replace
